Convex Optimization 2015.09.16
\(\min f_0(x)\)
\(s.t. f_i(x) \le b_i, i = 1 ... m\)
Here \(x \in \mathbb{R}^n\), and \(f_i : \mathbb{R}^n \rightarrow \mathbb{R}, i = 0 ... m\) are convex functions.
I.e. \(f_i(\theta x + (1 - \theta) y) \le \theta f_i(x) + (1 - \theta) f_i(y)\) for all \(\theta \in [0, 1]\) and all \(x, y \in dom f_i\)
where \(dom f_i \subseteq \mathbb{R}^n\) is a convex set
where \(C \subseteq \mathbb{R}^n\) is convex if and only if \(\theta x + (1 - \theta) y \in C\) for all \(x, y \in C\), for all \(\theta \in [0, 1]\)
Fact: The set of \(x\) s.t. \(f_i(x) \le b_i\) is a convex set.
Proof: Take \(x, y, s.t. f_i(x) \le b_i, f_i(y) \le b_i\) and \(\theta \in [0, 1]\),
Then \(f_i(\theta x + (1 - \theta) y) \le \theta f_i(x) + (1 - \theta) f_i(y) \le \theta b_i + (1 - \theta) b_i = b_i\)
Fact: The function \(f: \mathbb{R} \rightarrow \mathbb{R}, f(x) = x^2\) is convex.
Proof: \((\theta x + (1 - \theta) y)^2 = \theta^2 x^2 + 2\theta ( 1 - \theta) xy + (1 - \theta)^2 y^2\)
\(=[\theta + \theta^2 - \theta] x^2 + 2\theta (1 - \theta) xy + [(1 - \theta) + (1 - \theta)^2 - (1 - \theta)] y^2\)
\(=\theta x^2 + (1 - \theta) y^2 + \theta(\theta - 1) x^2 + 2 \theta (1 - \theta) xy + (1 - \theta)(- \theta) y^2\)
\(=\theta f(x) + (1 - \theta) f(y) + \theta (\theta - 1) (x - y)^2\)
\(\ge \theta f(x) + (1 - \theta) f(y)\), since \(\theta(\theta - 1) \le 0\) for \(\theta \in [0, 1]\)
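Aside: a minimal numerical sanity check of the identity used in the step above (the test values are arbitrary):

```python
import numpy as np

# Check the identity (theta*x + (1-theta)*y)^2
#   == theta*x^2 + (1-theta)*y^2 + theta*(theta-1)*(x-y)^2
# on random test points; this is the algebraic step behind the proof.
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-10, 10, size=2)
    theta = rng.uniform(0, 1)
    lhs = (theta * x + (1 - theta) * y) ** 2
    rhs = theta * x**2 + (1 - theta) * y**2 + theta * (theta - 1) * (x - y) ** 2
    assert np.isclose(lhs, rhs)
```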
Every linear function is a convex function.
Fact: Let \(A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m\), and let \(f: \mathbb{R}^m \rightarrow \mathbb{R}\) be convex,
Then \(g:\mathbb{R}^n \rightarrow \mathbb{R}\) defined by \(g(x) = f(Ax + b)\) is convex.
Proof: First, \(dom g\) is convex: let \(x, y \in dom g, \theta \in [0, 1]\),
Then \(A(\theta x + (1 - \theta) y) + b = \theta (Ax + b) + (1 - \theta)(Ay + b) \in dom f\) (since \(dom f\) is convex), so \(\theta x + (1 - \theta) y \in dom g\).
Now take \(x, y \in dom g, \theta \in [0, 1]\)
Then \(g(\theta x + (1 - \theta) y) = f(\theta (Ax + b) + (1 - \theta)(Ay + b)) \le \theta f(Ax + b) + (1 - \theta) f(Ay + b) = \theta g(x) + (1 - \theta) g(y)\)
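Aside: the same inequality can be checked numerically; a minimal sketch where \(f\) is taken to be the (convex) squared Euclidean norm and \(A\), \(b\) are arbitrary test data:

```python
import numpy as np

# Numerical check that g(x) = f(Ax + b) satisfies the convexity inequality
# when f is convex (here f(z) = ||z||_2^2).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
f = lambda z: float(np.sum(z ** 2))
g = lambda x: f(A @ x + b)

for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    theta = rng.uniform(0, 1)
    assert g(theta * x + (1 - theta) * y) <= theta * g(x) + (1 - theta) * g(y) + 1e-9
```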
Fact: The sum of two convex functions is convex.
Note: The proof starts by noting that if \(f\), \(g\) are convex functions, then \(dom(f + g) = dom f \cap dom g\) is convex, because the intersection of convex sets is convex.
Example: The function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) defined by \(f(x) = \lVert Ax - b \rVert _2^2\) is convex.
Proof: \(\lVert Ax - b \rVert _2^2 = (a_1^T x - b_1)^2 + ... + (a_m^T x - b_m)^2\) is a sum of convex functions.
Example: \(\min \lVert Ax - b \rVert _2^2\) is a convex optimization problem. This kind of problem is known as a least-squares problem.
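Aside: a least-squares problem has a closed-form optimality condition (the normal equations \(A^TAx = A^Tb\)); a minimal sketch with NumPy, where \(A\) and \(b\) are placeholder data:

```python
import numpy as np

# Solve min_x ||Ax - b||_2^2 with the built-in least-squares routine.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 4))   # placeholder data
b = rng.standard_normal(20)

x_star, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# The minimizer satisfies the normal equations A^T A x = A^T b.
print(np.allclose(A.T @ A @ x_star, A.T @ b))  # True
```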
Definition: Norms: A function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) is a norm if and only if
\(f(x) \ge 0\) for all \(x \in \mathbb{R}^n\) non-negativity
\(f(x) = 0 \iff x = 0\) definiteness
\(f(t \cdot x) = \lvert t \rvert \cdot f(x)\) for all \(t \in \mathbb{R}, x \in \mathbb{R}^n\) homogeneity
\(f(x + y) \le f(x) + f(y), \forall x, y \in \mathbb{R}^n\) triangle inequality
Example: For \(p \ge 1\), \(\lVert x \rVert _p = (\lvert x_1 \rvert ^p + ... + \lvert x_n \rvert ^p)^{\frac{1}{p}}\) is a norm
\(\lVert x \rVert _\infty = \max_{i = 1 ... n} \lvert x_i \rvert\) is also a norm.
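Aside: these norms can be evaluated directly with NumPy; a small illustration on an arbitrary vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])    # arbitrary example vector
print(np.linalg.norm(x, 1))       # l1 norm: |3| + |-4| + |1| = 8
print(np.linalg.norm(x, 2))       # l2 norm: sqrt(9 + 16 + 1) ~ 5.10
print(np.linalg.norm(x, np.inf))  # l-infinity norm: max(3, 4, 1) = 4
```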
Geometric Picture: The set \(B_1 = \{x \in \mathbb{R}^n : \lVert x \rVert \le 1\}\) is bounded, convex, centrally symmetric about 0, and has nonempty interior.
(If \(C \subseteq \mathbb{R}^n\), the interior \(C^0\) of \(C\) is the set \(\{x \in C : \{y : \lVert x - y \rVert _2 \lt r\} \subseteq C\) for some \(r \gt 0\}\))
Definition: We say that a sequence \((z_i)_{i = 1}^{\infty}\) converges to \(z\) under the norm \(\lVert \cdot \rVert\) if and only if for every \(\varepsilon \gt 0\) there exists \(N \in \mathbb{N}, s.t. \lVert z_i - z \rVert \lt \varepsilon\) for all \(i \ge N\)
Fact: In \(\mathbb{R}^n\), convergence happens in one norm if and only if it happens in any other norm.
Let \(\lVert \cdot \rVert\) be a norm in \(\mathbb{R}^n\),
Then \(\exists c_1, c_2 \gt 0\), such that \(c_1 \lVert x \rVert _1 \le \lVert x \rVert \le c_2 \lVert x \rVert _1\) for all \(x \in \mathbb{R}^n\)
Lemma 1 (reverse triangle inequality): \(\lVert x - y \rVert \ge \lVert x \rVert - \lVert y \rVert\) for any norm.
Proof: \(\lVert x \rVert = \lVert y + (x - y) \rVert \le \lVert y \rVert + \lVert x - y \rVert\)
Lemma 2: The norm \(\lVert \cdot \rVert\) is continuous with respect to \(\lVert \cdot \rVert _1\).
I.e., for any \(x \in \mathbb{R}^n\), for any \(\varepsilon \gt 0\), there exists \(\delta \gt 0\)
\(s.t. \lVert x - y \rVert _1 \lt \delta \implies \lvert \lVert x \rVert - \lVert y \rVert \rvert \lt \varepsilon\)
Proof: \(\lvert \lVert x \rVert - \lVert y \rVert \rvert \le \lVert x - y \rVert\) (by Lemma 1)
and \(\lVert x - y \rVert = \lVert \sum_{i = 1}^n \vec{e_i} (x_i - y_i) \rVert\)
\(\le \sum_{i = 1}^n \lVert \vec{e_i} (x_i - y_i) \rVert = \sum_{i = 1}^n \lvert x_i - y_i \rvert \lVert \vec{e_i} \rVert\)
\(\le \max_i (\lVert \vec{e_i} \rVert) \cdot \sum_{i = 1}^n \lvert x_i - y_i \rvert = \max_i (\lVert \vec{e_i} \rVert) \cdot \lVert x - y \rVert _1\)
where \(\vec{e_1}, ..., \vec{e_n}\) is the standard basis
so if \(\delta = \frac{\varepsilon}{\max_i (\lVert \vec{e_i} \rVert)}\)
then \(\lVert x - y \rVert _1 \lt \delta \implies \lvert \lVert x \rVert - \lVert y \rVert \rvert \lt \varepsilon\)
By Lemma 2 (continuity), \(\lVert \cdot \rVert\) attains its minimum and maximum on the set \(\{x : \lVert x \rVert _1 = 1\}\), which is closed and bounded.
Let \(x_\min\) be such that \(\lVert x_\min \rVert \le \lVert x \rVert\) for all \(x \in \{x : \lVert x \rVert _1 = 1\}\),
let \(x_\max\) be such that \(\lVert x_\max \rVert \ge \lVert x \rVert\) for all \(x \in \{x : \lVert x \rVert _1 = 1\}\)
Let \(c_1 = \lVert x_\min \rVert \neq 0\) (by definiteness, since \(x_\min \neq 0\)), \(c_2 = \lVert x_\max \rVert\); then for arbitrary \(x, x \neq 0\),
\(\lVert x \rVert = \lVert \frac{x}{\lVert x \rVert _1} \cdot \lVert x \rVert _1 \rVert = \lVert x \rVert _1 \cdot \lVert \frac{x}{\lVert x \rVert _1} \rVert \le \lVert x \rVert _1 \cdot \lVert x_\max \rVert = c_2 \cdot \lVert x \rVert _1\)
Similarly, \(\lVert x \rVert \ge c_1 \lVert x \rVert _1\)
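Aside: for the Euclidean norm \(\lVert \cdot \rVert _2\) one may take \(c_1 = \frac{1}{\sqrt{n}}\) and \(c_2 = 1\); a small numerical check of the fact on random vectors:

```python
import numpy as np

# Check c1 * ||x||_1 <= ||x||_2 <= c2 * ||x||_1 with c1 = 1/sqrt(n), c2 = 1.
n = 7
c1, c2 = 1 / np.sqrt(n), 1.0
rng = np.random.default_rng(3)
for _ in range(1000):
    x = rng.standard_normal(n)
    one, two = np.linalg.norm(x, 1), np.linalg.norm(x, 2)
    assert c1 * one <= two + 1e-12 and two <= c2 * one + 1e-12
```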
Definition: A matrix \(A \in \mathbb{R}^{n \times n}\) is positive semidefinite, if \(x^TAx \ge 0, \forall x \in \mathbb{R}^n\), and \(A\) is symmetric.
Fact: Let \(A\) be symmetric, \(A = P^TDP\) where \(P\) is orthogonal and \(D\) is diagonal,
Then \(A\) is positive semidefinite iff \(D \ge 0\) (i.e., every diagonal entry of \(D\) is nonnegative).
Proof: Say \(D = diag(\lambda_1, ..., \lambda_n)\) and suppose some \(\lambda_i \lt 0\).
Let \(x = P^T\vec{e_i}\), Then \(x^TAx = (P^T\vec{e_i})^TP^TDPP^T\vec{e_i} = \vec{e_i}^TD\vec{e_i} = \lambda_i \lt 0\), so \(A\) is not positive semidefinite.
Conversely, if \(D \ge 0\), then \(x^TAx = (x^TP^T)D(Px) = (Px)^TD(Px) \ge 0\) for all \(x\).
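Aside: this fact gives a practical test for positive semidefiniteness: compute the eigenvalues of the symmetric matrix and check their signs. A minimal sketch with NumPy (the test matrices are arbitrary):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check positive semidefiniteness of a symmetric matrix via its eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
    return bool(np.all(eigenvalues >= -tol))

# Example: B^T B is always positive semidefinite.
rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
print(is_psd(B.T @ B))                               # True
print(is_psd(np.array([[1.0, 0.0], [0.0, -1.0]])))   # False
```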